Constrained Hidden Markov Models

Author

  • Sam T. Roweis
Abstract

By thinking of each state in a hidden Markov model as corresponding to some spatial region of a fictitious topology space, it is possible to naturally define neighbouring states as those which are connected in that space. The transition matrix can then be constrained to allow transitions only between neighbours; this means that all valid state sequences correspond to connected paths in the topology space. I show how such constrained HMMs can learn to discover underlying structure in complex sequences of high-dimensional data, and apply them to the problem of recovering mouth movements from acoustics in continuous speech.

1 Latent variable models for structured sequence data

Structured time-series are generated by systems whose underlying state variables change in a continuous way but whose state-to-output mappings are highly nonlinear, many-to-one and not smooth. Probabilistic unsupervised learning for such sequences requires models with two essential features: latent (hidden) variables and topology in those variables. Hidden Markov models (HMMs) can be thought of as dynamic generalizations of discrete-state static data models such as Gaussian mixtures, or as discrete-state versions of linear dynamical systems (LDSs), which are themselves dynamic generalizations of continuous latent variable models such as factor analysis.

While both HMMs and LDSs provide probabilistic latent variable models for time-series, both have important limitations. Traditional HMMs have a very powerful model of the relationship between the underlying state and the associated observations, because each state stores a private distribution over the output variables. This means that any change in the hidden state can cause arbitrarily complex changes in the output distribution. However, it is extremely difficult to capture reasonable dynamics on the discrete latent variable, because in principle any state is reachable from any other state at any time step, and the next state depends only on the current state. LDSs, on the other hand, have an extremely impoverished representation of the outputs as a function of the latent variables, since this transformation is restricted to be global and linear. But it is somewhat easier to capture state dynamics, since the state is a multidimensional vector of continuous variables on which a matrix "flow" is acting; this enforces some continuity of the latent variables across time.

Constrained hidden Markov models address the modeling of state dynamics by building some topology into the hidden state representation. The essential idea is to constrain the transition parameters of a conventional HMM so that the discrete-valued hidden state evolves in a structured way. In particular, below I consider parameter restrictions which constrain the state to evolve as a discretized version of a continuous multivariate variable, i.e. so that it inscribes only connected paths in some space. This lends a physical interpretation to the discrete state trajectories in an HMM. A standard trick in traditional speech applications of HMMs is to use "left-to-right" transition matrices, which are a special case of the type of constraints investigated in this paper. However, left-to-right (Bakis) HMMs force state trajectories that are inherently one-dimensional and uni-directional, whereas here I also consider higher-dimensional topologies and free omni-directional motion.
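As a concrete point of reference, the left-to-right case can be written down directly. The sketch below is my own illustration, not code from the paper; the name bakis_transitions and the max_jump parameter are illustrative. It builds a Bakis transition matrix in which each state may only hold or advance, with uniform probability over the permitted transitions:

```python
import numpy as np

def bakis_transitions(M, max_jump=1):
    """Left-to-right (Bakis) transition matrix over M states: from state i,
    only states i..i+max_jump are reachable, so every trajectory is
    one-dimensional and uni-directional."""
    A = np.zeros((M, M))
    for i in range(M):
        last = min(i + max_jump, M - 1)
        A[i, i:last + 1] = 1.0 / (last - i + 1)  # uniform over permitted moves
    return A

print(bakis_transitions(4))
```

The constrained HMMs developed below generalize this structure: the nonzero pattern of the transition matrix comes from a neighbourhood graph in a d-dimensional topology space rather than from a one-dimensional chain.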
2 An illustrative game

Consider playing the following game: divide a sheet of paper into several contiguous, non-overlapping regions which between them cover it entirely. In each region inscribe a symbol, allowing symbols to be repeated in different regions. Place a pencil on the sheet and move it around, reading out (in order) the symbols in the regions through which it passes. Add some noise to the observation process, so that some fraction of the time incorrect symbols are reported in the list instead of the correct ones. The game is to reconstruct the configuration of regions on the sheet from only such an ordered list (or lists) of noisy symbols. Of course, the absolute scale, rotation and reflection of the sheet can never be recovered, but learning the essential topology may be possible. Figure 1 illustrates this setup.

Figure 1: (left) True map which generates symbol sequences by random movement between connected cells. (centre) An example noisy output sequence with noisy symbols circled. (right) Learned map after training on 3 sequences (with 15% noise probability), each 200 symbols long. Each cell actually contains an entire distribution over all observed symbols, though in this case only the upper right cell has significant probability mass on more than one symbol (see figure 3 for display details).

Without noise or repeated symbols the game is easy (non-probabilistic methods can solve it), but in their presence it is not. One way of mitigating the noise problem is to do statistical averaging. For example, one could attempt to use the average separation in time of each pair of symbols to define a dissimilarity between them. It would then be possible to use methods like multi-dimensional scaling, or a sort of Kohonen mapping through time (see the note below), to explicitly construct a configuration of points obeying those distance relations. However, such methods still cannot deal with many-to-one state-to-output mappings (repeated numbers on the sheet), because by their nature they assign a unique spatial location to each symbol.

Note (on Kohonen-style mappings through time): Consider a network of units which compete to explain input data points. Each unit has a position in the output space as well as a position in a lower-dimensional topology space. The winning unit has its position in output space updated towards the data point; but the recent (in time) winners also have their positions in topology space updated towards the topology-space location of the current winner. Such a rule works well, and yields topological maps in which nearby units code for data that typically occur close together in time. However, it cannot learn many-to-one maps in which more than one unit at different topology locations has the same (or very similar) outputs.

Playing this game is analogous to doing unsupervised learning on structured sequences. (The game can also be played with continuous outputs, although often high-dimensional data can be effectively clustered around a manageable number of prototypes; thus a vector time-series can be converted into a sequence of symbols.) Constrained HMMs incorporate latent variables with topology yet retain powerful nonlinear output mappings, and can deal with the difficulties of noise and many-to-one mappings mentioned above; so they can "win" our game (see figs. 1 & 3). The key insight is that the game generates sequences exactly according to a hidden Markov process whose transition matrix allows only transitions between neighbouring cells, and whose output distributions have most of their probability on a single symbol with a small amount on all other symbols to account for noise. The observed symbol sequence must be "informative enough" to reveal the map structure (this can be quantified using the idea of persistent excitation from control theory).
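The game's generative process is itself a small program. The following sketch is illustrative rather than taken from the paper: the 5x5 map, function names, and starting position are my assumptions, while the 15% noise rate matches figure 1. It draws a random symbol map, moves a "pencil" between face-neighbouring cells, and corrupts each reported symbol with probability eps:

```python
import numpy as np

rng = np.random.default_rng(0)

side, n_symbols, eps = 5, 25, 0.15
grid = rng.integers(1, n_symbols + 1, size=(side, side))  # symbols may repeat

def step(pos):
    """Move the pencil to a uniformly chosen face neighbour."""
    r, c = pos
    moves = [(r + dr, c + dc) for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
             if 0 <= r + dr < side and 0 <= c + dc < side]
    return moves[rng.integers(len(moves))]

def emit(pos):
    """Report the cell's symbol, or a uniformly chosen wrong one w.p. eps."""
    true_sym = int(grid[pos])
    if rng.random() < eps:
        wrong = [s for s in range(1, n_symbols + 1) if s != true_sym]
        return wrong[rng.integers(len(wrong))]
    return true_sym

pos, seq = (2, 2), []
for _ in range(200):
    pos = step(pos)
    seq.append(emit(pos))
print(seq[:12])
```

Reconstructing grid from sequences like seq is exactly the learning problem the constrained HMM solves: its transition matrix encodes the face-neighbour moves of step, and each state's output distribution matches the noisy behaviour of emit.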
3 Model definition: state topologies from cell packings

Defining a constrained HMM involves identifying each state of the underlying (hidden) Markov chain with a spatial cell in a fictitious topology space. This requires selecting a dimensionality d for the topology space and choosing a packing (such as hexagonal or cubic) which fills the space. The number of cells in the packing is equal to the number of states M in the original Markov model. Cells are taken to be all of equal size and (since the scale of the topology space is completely arbitrary) of unit volume. Thus, the packing covers a volume M in topology space with a side length $\ell$ of roughly $\ell = M^{1/d}$.

The dimensionality and packing together define a vector-valued function $x(m)$, $m = 1 \ldots M$, which gives the location of cell m in the packing. (For example, a cubic packing of d-dimensional space defines $x(m+1) = [\,m,\ m/\ell,\ m/\ell^2,\ \ldots,\ m/\ell^{d-1}\,] \bmod \ell$, with the divisions taken as integer division.) State m in the Markov model is assigned to cell m in the packing, thus giving it a location $x(m)$ in the topology space. Finally, we must choose a neighbourhood rule in the topology space which defines the neighbours of cell m; for example, all "connected" cells, all face neighbours, or all those within a certain radius. (For cubic packings, there are $3^d - 1$ connected neighbours and $2d$ face neighbours in a d-dimensional topology space.) The neighbourhood rule also defines the boundary conditions of the space – e.g. periodic boundary conditions would make cells on opposite extreme faces of the space neighbours with each other.

The transition matrix of the HMM is now preprogrammed to allow transitions only between neighbours. All other transition probabilities are set to zero, making the transition matrix very sparse. (I have set all permitted transitions to be equally likely.) Now, all valid state sequences in the underlying Markov model represent connected ("city block") paths through the topology space. Figure 2 illustrates this for a three-dimensional model.
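A minimal sketch of this construction for a cubic packing with face neighbours follows; the code is mine rather than the paper's, and allow_self is an assumption, since the text does not say whether a state may also remain where it is:

```python
import numpy as np

def constrained_transitions(d, ell, allow_self=True):
    """Transition matrix for M = ell**d states identified with the cells of
    a cubic packing of d-dimensional topology space. Only face-neighbour
    (city-block distance 1) transitions are permitted, all equally likely."""
    M = ell ** d
    # x(m): location of cell m in the packing, i.e. the digits of m base ell.
    x = np.array([[(m // ell ** k) % ell for k in range(d)] for m in range(M)])
    A = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            dist = int(np.abs(x[i] - x[j]).sum())    # city-block distance
            if dist == 1 or (allow_self and i == j):  # self-loop: my assumption
                A[i, j] = 1.0
        A[i] /= A[i].sum()    # all permitted transitions equally likely
    return A

A = constrained_transitions(d=2, ell=5)    # 25 states tiling a 5x5 sheet
print((A > 0).sum(axis=1))                 # 3-5 permitted moves per state
```

With d = 2 this reproduces the sheet-of-paper game of section 2; with d = 3 it gives a three-dimensional model like that of figure 2, and connected (rather than face) neighbourhoods or periodic boundaries would simply change which entries of A are nonzero.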



Publication date: 1999